Aesthetics & Scales (with Pokémon)

DSST 289: Introduction to Data Science

Erik Fredner

2024-08-26

Outline

  • Aesthetics & scales: so what?
  • Aesthetics & scales with Pokémon
    • pokemon data
    • geom_point
    • geom_text & label
    • scale_ & limits
    • n.breaks
    • color
    • scale_color
    • size
    • shape
    • facet_ing plots

Aesthetics: so what?

  • Aesthetics (such as color, size, shape, etc.) determine how data points are visually distinguished in a plot.
    • Choosing the right aesthetics ensures that the visualization communicates the correct message.
    • For example, this would be confusing in US politics!
      • Democrats vs. Republicans
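
One way this kind of confusion can arise: ggplot2 assigns its default discrete colors in alphabetical order of the group names, so "Democrat" is drawn in a reddish color and "Republican" in a bluish one, the reverse of the US convention. The sketch below uses a made-up data frame called toy (hypothetical points, not real data) and sets the colors explicitly with scale_color_manual().

```r
library(ggplot2)

# made-up points, for illustration only (not real data)
toy <- data.frame(
  party = c("Democrat", "Republican", "Democrat", "Republican"),
  x = c(1, 2, 3, 4),
  y = c(2, 4, 1, 3)
)

# default palette: colors follow the alphabetical order of `party`,
# so "Democrat" gets the reddish color and "Republican" the bluish one
p_default <- toy |>
  ggplot() +
  geom_point(aes(x = x, y = y, color = party))

# explicit mapping that matches the US convention
p_manual <- p_default +
  scale_color_manual(values = c(Democrat = "blue", Republican = "red"))

# the two default colors used for a two-level discrete variable
default_colors <- scales::hue_pal()(2)
```

Naming the values in scale_color_manual() ties each color to a group, so the mapping does not depend on the order of the groups.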

Scales: so what?

  • Scales control how data values are mapped onto visual properties: position on the x- and y-axes, but also color, size, and shape.
    • This affects how easily readers can interpret the visualization.
    • Proper scaling can prevent misleading representations.
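
As a small illustration of how a scale choice changes the impression a plot gives, the sketch below plots hp for three starter Pokémon (values copied from the pokemon data shown in the next section; the name starters is only for this example). With default limits the y-axis covers roughly 39 to 45, so small differences look large. Starting the axis at zero puts them in proportion. Which view is appropriate depends on the question being asked.

```r
library(ggplot2)

starters <- data.frame(
  name = c("Bulbasaur", "Charmander", "Squirtle"),
  hp   = c(45, 39, 44)
)

# default y-axis: limits are fitted to the data
p_default <- starters |>
  ggplot() +
  geom_point(aes(x = name, y = hp))

# y-axis starting at zero; NA leaves the upper limit to ggplot2
p_zero <- p_default +
  scale_y_continuous(limits = c(0, NA))
```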

pokemon data

Code
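library(tidyverse) # provides read_csv(), filter(), ggplot(), and pivot_longer()
library(ggrepel)   # provides geom_text_repel(), used for labels below

pokemon <- read_csv("../data/pokemon.csv")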

# take a look at the data:
pokemon
# A tibble: 1,194 × 13
   pokedex_no name       form  type_1 type_2 stat_total    hp attack defense
        <dbl> <chr>      <chr> <chr>  <chr>       <dbl> <dbl>  <dbl>   <dbl>
 1          1 Bulbasaur  <NA>  Grass  Poison        318    45     49      49
 2          2 Ivysaur    <NA>  Grass  Poison        405    60     62      63
 3          3 Venusaur   <NA>  Grass  Poison        525    80     82      83
 4          4 Charmander <NA>  Fire   <NA>          309    39     52      43
 5          5 Charmeleon <NA>  Fire   <NA>          405    58     64      58
 6          6 Charizard  <NA>  Fire   Flying        534    78     84      78
 7          7 Squirtle   <NA>  Water  <NA>          314    44     48      65
 8          8 Wartortle  <NA>  Water  <NA>          405    59     63      80
 9          9 Blastoise  <NA>  Water  <NA>          530    79     83     100
10         10 Caterpie   <NA>  Bug    <NA>          195    45     30      35
# ℹ 1,184 more rows
# ℹ 4 more variables: sp_attack <dbl>, sp_defense <dbl>, speed <dbl>,
#   generation <dbl>

Aesthetics & Scales with Pokémon

The Pokémon with the highest defense and hp appear in the top-right:

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp))

Modifying scales

Suppose we want to flip the y-axis so that the Pokémon with the highest defense and lowest hp appear in the top-right corner.

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  # reverse the y-axis
  scale_y_reverse()

Combining scale_, aes, & geom_

Who has low hp and high defense?

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  # new:
  geom_text(aes(x = defense, y = hp, label = name))

Limiting scales

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  # repel the text labels so they don't overlap (from the ggrepel package):
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # limit the x-axis to `defense` of 150 or more:
  # `NA` ("Not Available") is a missing value indicator.
  # We use it here to say that there is no upper limit on the x-axis.
  scale_x_continuous(limits = c(150, NA))
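
One thing worth knowing about limits: setting them on a scale removes the points outside the range (ggplot2 warns about the removed rows), whereas coord_cartesian() only zooms the view. The sketch below uses the ten rows printed earlier as a stand-in data frame called first_ten, with a cutoff of 60 rather than 150 because none of those ten rows reach 150. Both the name and the cutoff are assumptions for illustration.

```r
library(ggplot2)

first_ten <- data.frame(
  name = c("Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmeleon",
           "Charizard", "Squirtle", "Wartortle", "Blastoise", "Caterpie"),
  hp = c(45, 60, 80, 39, 58, 78, 44, 59, 79, 45),
  defense = c(49, 63, 83, 43, 58, 78, 65, 80, 100, 35)
)

# scale limits: points with defense below 60 are dropped
p_limits <- first_ten |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_x_continuous(limits = c(60, NA))

# coord limits: all points are kept; the plot is only zoomed
p_zoom <- first_ten |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  coord_cartesian(xlim = c(60, 100))

# how many rows fall outside the scale limits
n_dropped <- sum(first_ten$defense < 60)
```

The difference matters most for layers that compute something from the data, such as geom_smooth(): with scale limits the computation sees only the remaining points.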

Increasing n.breaks

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # make it easier to identify the precise values of `defense`:
  scale_x_continuous(limits = c(150, NA), n.breaks = 30)
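
n.breaks is a request rather than a guarantee: ggplot2 may choose a slightly different number of breaks so that the labels land on round values. When exact tick positions matter, the breaks argument takes them directly. A minimal sketch, using three rows from the printed data under the assumed name water and an assumed spacing of 5:

```r
library(ggplot2)

water <- data.frame(
  name = c("Squirtle", "Wartortle", "Blastoise"),
  hp = c(44, 59, 79),
  defense = c(65, 80, 100)
)

# explicit tick positions every 5 units of defense
x_breaks <- seq(60, 100, by = 5)

p_breaks <- water |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_x_continuous(limits = c(60, 100), breaks = x_breaks)
```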

Color

  • We can map color to a variable to reveal patterns in the data.
  • e.g., Are there relationships between type_1, defense, and hp?
  • We’re also going to filter for first-generation Pokémon to reduce the number of points.

Color by type_1

Code
pokemon |>
  filter(generation == 1) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  geom_text_repel(aes(x = defense, y = hp, label = name))

Custom color

Let’s use colors associated with 🔥, 🍃, and 💧 Pokemon:

Code
pokemon |>
  filter(generation == 1) |>
  # filter for just a few types
  filter(type_1 %in% c("Water", "Fire", "Grass")) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # use a custom color for each `type_1` instead of the defaults:
  scale_color_manual(values = c(
    Water = "blue",
    Fire = "red",
    Grass = "green"
  ))

scale_color

Mewtwo has a high stat_total:

Code
pokemon |>
  filter(generation == 1) |>
  ggplot() +
  # color the points by `stat_total` instead of `type_1`:
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  # use the `viridis` color palette instead of the default:
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))

size

Magikarp has a low stat_total:

Code
pokemon |>
  filter(generation == 1) |>
  # just water pokemon
  filter(type_1 == "Water") |>
  ggplot() +
  # new: `size` by `stat_total`
  geom_point(aes(x = defense, y = hp, size = stat_total)) +
  geom_text_repel(aes(x = defense, y = hp, label = name))

Combine size and color

Code
pokemon |>
  filter(generation == 1) |>
  # just psychic pokemon
  filter(type_1 == "Psychic") |>
  ggplot() +
  # new: `color` by `stat_total`, too
  geom_point(aes(x = defense, y = hp, size = stat_total, color = stat_total)) +
  # use the `viridis` color palette instead of the default:
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))

Combining color and shape

Code
pokemon |>
  # filter for first gen
  filter(generation == 1) |>
  # filter for a few types
  filter(type_1 %in% c("Normal", "Rock", "Bug", "Poison")) |>
  ggplot() +
  geom_point(aes(
    x = defense,
    y = hp,
    # new: shape points by `type_1`
    shape = type_1,
    # color points by `stat_total`
    color = stat_total
  )) +
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))

facet_ing plots

Code
# faceting allows us to split a plot into multiple panels based on a categorical variable
pokemon |>
  filter(generation == 1) |>
  filter(type_1 %in% c("Normal", "Rock", "Bug", "Poison")) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  scale_color_viridis_c() +
  # new: `~` means "by", so we are saying "facet wrap by `type_1`"
  facet_wrap(~type_1) +
  # note that the scales of the plots are all the same
  # this makes them directly comparable
  geom_text_repel(aes(x = defense, y = hp, label = name))
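
facet_wrap() shares axis ranges across panels by default, which is what makes the panels directly comparable. If each panel should instead fit its own data, the scales argument relaxes that. A sketch using the ten printed rows as an assumed stand-in data frame called first_ten:

```r
library(ggplot2)

first_ten <- data.frame(
  type_1 = c("Grass", "Grass", "Grass", "Fire", "Fire",
             "Fire", "Water", "Water", "Water", "Bug"),
  hp = c(45, 60, 80, 39, 58, 78, 44, 59, 79, 45),
  defense = c(49, 63, 83, 43, 58, 78, 65, 80, 100, 35)
)

# shared scales (the default): panels are directly comparable
p_fixed <- first_ten |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  facet_wrap(~type_1)

# free scales: each panel gets its own axis ranges
p_free <- first_ten |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  facet_wrap(~type_1, scales = "free")
```

Free scales make patterns within each panel easier to see, at the cost of comparisons between panels.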

Bonus 1: geom_smooth

Code
# geom_smooth adds a smoothed line to plots
# what is the general relationship between hp and defense?
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  scale_color_viridis_c() +
  # new: add a smoothed line to the plot
  geom_smooth(aes(x = defense, y = hp))
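
By default geom_smooth() picks a flexible smoother and draws a confidence band around it. To ask for a straight-line fit instead, set method = "lm"; se = FALSE hides the band. A sketch on the ten printed rows (the name first_ten is an assumption for this example):

```r
library(ggplot2)

first_ten <- data.frame(
  hp = c(45, 60, 80, 39, 58, 78, 44, 59, 79, 45),
  defense = c(49, 63, 83, 43, 58, 78, 65, 80, 100, 35)
)

# straight-line fit without the confidence band
p_lm <- first_ten |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  geom_smooth(aes(x = defense, y = hp), method = "lm", se = FALSE)

# the same linear model fitted directly, to inspect the slope
fit <- lm(hp ~ defense, data = first_ten)
slope <- coef(fit)[["defense"]]
```

In these ten rows the slope is positive: higher defense tends to go with higher hp.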

Bonus 2: facet_ everything

Code
pokemon |>
  # we are "pivoting" the data from wide to long format for ease of plotting
  # we will discuss this more later!
  pivot_longer(
    cols = c(attack, sp_attack, defense, sp_defense, speed),
    names_to = "stat_category",
    values_to = "stat_value"
  ) |>
  ggplot() +
  geom_point(aes(x = stat_value, y = hp, color = stat_total)) +
  geom_smooth(aes(x = stat_value, y = hp)) +
  scale_color_viridis_c() +
  # new: facet by `stat_category`, a column we created with `pivot_`
  facet_wrap(~stat_category)
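
To make the pivot step less mysterious, here is what pivot_longer() does to a tiny table. The sketch uses three rows and two stat columns copied from the printed data, under the assumed names wide and long:

```r
library(tidyr)

wide <- data.frame(
  name = c("Bulbasaur", "Charmander", "Squirtle"),
  hp = c(45, 39, 44),
  attack = c(49, 52, 48),
  defense = c(49, 43, 65)
)

# stack the `attack` and `defense` columns into name/value pairs
long <- wide |>
  pivot_longer(
    cols = c(attack, defense),
    names_to = "stat_category",
    values_to = "stat_value"
  )

# `long` has one row per Pokémon per stat: 3 Pokémon x 2 stats = 6 rows,
# with `name` and `hp` repeated on each row
```

Because stat_category is now a single column, it can be handed to facet_wrap() like any other categorical variable.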

Summary

  • Aesthetics determine how data points are visually distinguished, including aspects like color, size, and shape.
  • Scales control how data values are mapped onto visual properties such as axis position, color, and size. Proper scaling ensures that visualizations are easy to interpret and not misleading.
  • Manipulating both aesthetics and scales can reveal patterns and/or outliers in data.